460 research outputs found

    Combining Spectral Representations for Large Vocabulary Continuous Speech Recognition

    In this paper we investigate the combination of complementary acoustic feature streams in large vocabulary continuous speech recognition (LVCSR). We have explored the use of acoustic features obtained using a pitch-synchronous analysis, STRAIGHT, in combination with conventional features such as Mel frequency cepstral coefficients. Pitch-synchronous acoustic features are of particular interest when used with vocal tract length normalisation (VTLN), which is known to be affected by the fundamental frequency. We have combined these spectral representations directly at the acoustic feature level using heteroscedastic linear discriminant analysis (HLDA) and at the system level using ROVER. We evaluated this approach on three LVCSR tasks: dictated newspaper text (WSJCAM0), conversational telephone speech (CTS), and multiparty meeting transcription. The CTS and meeting transcription experiments were both evaluated using standard NIST test sets and evaluation protocols. Our results indicate that combining conventional and pitch-synchronous acoustic feature sets using HLDA results in a consistent, significant decrease in word error rate across all three tasks. Combining at the system level using ROVER resulted in a further significant decrease in word error rate.

    Pitch adaptive features for LVCSR

    We have investigated the use of a pitch adaptive spectral representation for large vocabulary speech recognition, in conjunction with speaker normalisation techniques. We have compared the effect of a smoothed spectrogram with that of the pitch adaptive spectral analysis by decoupling these two components of STRAIGHT. Experiments performed on a large vocabulary meeting speech recognition task highlight the importance of combining a pitch adaptive spectral representation with a conventional fixed window spectral analysis. We found evidence that STRAIGHT pitch adaptive features are more speaker independent than conventional MFCCs without pitch adaptation, and thus also perform better when combined using feature combination techniques such as heteroscedastic linear discriminant analysis.

    Industrial Landscapes Between Environmental Sustainability and Landscape Constraints: The Case Study of Eurallumina in the Sulcis Area of Sardinia (Italy)

    In Italy, industrialization developed remarkably in the 1950s and 1960s, with the primary aim of ensuring economic growth and development. The location of industrial complexes was determined by the dynamics of production, which required a territory equipped with specific infrastructure such as water connections, sewers, gas pipelines and the electricity grid, and above all areas in which to build transport terminals capable of mitigating the costs of handling the product. This led Italy to locate industrial activities at many coastal sites, close to pre-existing urban contexts, resulting in a well-defined coastal industrial landscape, especially in the areas of Southern Italy that were chosen as centers of development. Today, the determining factor in location choices is the cost of labor, and this has made the delocalization of companies increasingly frequent, with worrying repercussions both for direct and induced employment and for the degradation of the landscape. This process, linked to safety regulations, to the updating of plants and to increasingly rigorous landscape legislation, makes the situation critical for existing industrial sites, including disused ones that have not yet been abandoned. For these reasons, the main objective of this article is to evaluate the compatibility between existing industrial areas at risk of delocalization and new interpretations of the environment and the landscape to be reconstituted, so that the levels of industrial production can be maintained within a framework of ecological protection rules and recently adopted landscape constraints.
In this regard, the authors use the Eurallumina plant in Sulcis, Sardinia (Italy), as a case study to analyze the problem of land uses in territories with an industrial vocation and the landscape components that deserve particular attention, in order to safeguard not only the economic and social context but also the quality of the coastal environment. The case study is particularly significant because for some years the Eurallumina plant was at risk of delocalization: it needs to convert some parts of its installations, and this conversion was blocked by the landscape regulation imposed by the Superintendence of Cultural Heritage of Southern Sardinia because of the expected changes to the coastal environment. Therefore, keeping in mind location theories and the pressures toward the delocalization of industrial contexts, the study discusses the importance of the interconnection between economic and landscape factors, paying particular attention to coastal areas.

    Speaker normalisation for large vocabulary multiparty conversational speech recognition

    One of the main problems faced by automatic speech recognition is the variability of the testing conditions. This is due both to the acoustic conditions (different transmission channels, recording devices, noises etc.) and to the variability of speech across different speakers (i.e. due to different accents, coarticulation of phonemes and different vocal tract characteristics). Vocal tract length normalisation (VTLN) aims at normalising the acoustic signal, making it independent from the vocal tract length. This is done by a speaker specific warping of the frequency axis parameterised through a warping factor. In this thesis the application of VTLN to multiparty conversational speech was investigated focusing on the meeting domain. This is a challenging task showing a great variability of the speech acoustics both across different speakers and across time for a given speaker. VTL, the distance between the lips and the glottis, varies over time. We observed that the warping factors estimated using Maximum Likelihood seem to be context dependent: appearing to be influenced by the current conversational partner and being correlated with the behaviour of formant positions and the pitch. This is because VTL also influences the frequency of vibration of the vocal cords and thus the pitch. In this thesis we also investigated pitch-adaptive acoustic features with the goal of further improving the speaker normalisation provided by VTLN. We explored the use of acoustic features obtained using a pitch-adaptive analysis in combination with conventional features such as Mel frequency cepstral coefficients. These spectral representations were combined both at the acoustic feature level using heteroscedastic linear discriminant analysis (HLDA), and at the system level using ROVER. We evaluated this approach on a challenging large vocabulary speech recognition task: multiparty meeting transcription. We found that VTLN benefits the most from pitch-adaptive features. 
Our experiments also suggested that combining conventional and pitch-adaptive acoustic features using HLDA results in a consistent, significant decrease in the word error rate across all the tasks. Combining at the system level using ROVER resulted in a further significant improvement. Further experiments compared the use of a pitch adaptive spectral representation with the adoption of a smoothed spectrogram for the extraction of cepstral coefficients. It was found that pitch adaptive spectral analysis, which provides a representation less affected by pitch artefacts (especially for high pitched speakers), delivers features with improved speaker independence. Furthermore, this has also been shown to be advantageous when HLDA is applied. The combination of a pitch adaptive spectral representation and VTLN based speaker normalisation in the context of LVCSR for multiparty conversational speech led to more speaker independent acoustic models, improving the overall recognition performance.

    Applying Vocal Tract Length Normalization to Meeting Recordings

    Vocal Tract Length Normalisation (VTLN) is a commonly used technique to normalise for inter-speaker variability. It is based on the speaker-specific warping of the frequency axis, parameterised by a scalar warp factor. This factor is typically estimated using maximum likelihood. We discuss how VTLN may be applied to multiparty conversations, reporting a substantial decrease in word error rate in experiments using the ICSI meetings corpus. We investigate the behaviour of the VTLN warping factor and show that a stable estimate is not obtained. Instead it appears to be influenced by the context of the meeting, in particular the current conversational partner. These results are consistent with predictions made by the psycholinguistic interactive alignment account of dialogue, when applied at the acoustic and phonological levels
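The warping and maximum-likelihood search described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the piecewise-linear warp shape, the cutoff ratio of 0.85, the grid of candidate factors, and the `score_fn` callback (a hypothetical stand-in for the log-likelihood of a speaker's warped features under an acoustic model) are all choices made here for illustration.

```python
import numpy as np


def warp_frequencies(freqs, alpha, f_max, f_cut_ratio=0.85):
    """Piecewise-linear frequency warp with scalar warp factor `alpha`.

    Frequencies below a cutoff are scaled by `alpha`; above the cutoff a
    second linear segment is used so that `f_max` maps onto itself.
    The shape and the 0.85 cutoff ratio are assumptions for this sketch.
    """
    freqs = np.asarray(freqs, dtype=float)
    # Lower the cutoff when alpha > 1 so the scaled segment stays below f_max.
    f_cut = f_cut_ratio * f_max * min(1.0, 1.0 / alpha)
    low = alpha * freqs
    slope = (f_max - alpha * f_cut) / (f_max - f_cut)
    high = alpha * f_cut + slope * (freqs - f_cut)
    return np.where(freqs <= f_cut, low, high)


def estimate_warp_factor(score_fn, alphas=None):
    """Grid search for the warp factor that maximises `score_fn(alpha)`.

    `score_fn` is a hypothetical callback returning a log-likelihood for
    the speaker's data warped with `alpha`.
    """
    if alphas is None:
        # Assumed search grid: 0.80 to 1.20 in steps of 0.02.
        alphas = np.round(np.arange(0.80, 1.2001, 0.02), 2)
    scores = [score_fn(a) for a in alphas]
    return float(alphas[int(np.argmax(scores))])


if __name__ == "__main__":
    # Toy score peaking at 1.06, standing in for a real acoustic model.
    best = estimate_warp_factor(lambda a: -(a - 1.06) ** 2)
    print(best)
    print(warp_frequencies([0.0, 1000.0, 4000.0, 8000.0], best, 8000.0))
```

In this sketch the factor is estimated once per call; re-estimating it per meeting segment would be one way to observe the context dependence the abstract reports.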

    Floor Holder Detection and End of Speaker Turn Prediction in Meetings

    We propose a novel fully automatic framework to detect which meeting participant is currently holding the conversational floor and when the current speaker turn is going to finish. Two sets of experiments were conducted on a large collection of multiparty conversations: the AMI meeting corpus. Unsupervised speaker turn detection was performed by post-processing the speaker diarization and the speech activity detection outputs. A supervised end-of-speaker-turn prediction framework, based on Dynamic Bayesian Networks and automatically extracted multimodal features (related to prosody, overlapping speech, and visual motion), was also investigated. These novel approaches resulted in good floor holder detection rates (13.2% Floor Error Rate) and attained state-of-the-art end-of-speaker-turn prediction performance.

    Grafting of the 2,8-dithia-5-aza-2,6-pyridinophane macrocycle on SBA-15 mesoporous silica for the removal of Cu²⁺ and Cd²⁺ ions from aqueous solutions: synthesis, adsorption, and complex stability studies

    Silica-based mesoporous materials have received growing attention in metal recovery from industrial processes, although, in general, the adsorption of metal ions by silanols is rather poor. Nevertheless, a great improvement of metal ion removal from aqueous solutions can be achieved by grafting metal-chelators on the particles’ surface. Combining the metal-chelating properties of organic ligands with the high surface area of mesoporous silica particles makes these hybrid nanostructured materials a new horizon in metal recovery, sensing and controlled storage of metal ions in industrial and mining processes. Here, the 2,8-dithia-5-aza-2,6-pyridinophane (L) macrocycle was grafted on SBA-15 mesoporous silica to obtain the SBA-L mesoporous adsorbent for the removal and controlled recovery of Cd²⁺ and Cu²⁺ ions from aqueous solution in a broad pH range (4–11). By grafting about 0.3 mmol g⁻¹ of L on SBA-15, maximum loading capacities of 20.9 mg g⁻¹ and 31.8 mg g⁻¹ were obtained for Cu²⁺ and Cd²⁺, respectively. The adsorption kinetics can be described with the pseudo-second-order model, while the adsorption isotherm (298 K) followed the Langmuir model. The latter, together with potentiometric studies, suggests that the adsorption mechanism is based on metal chelation by the grafted macrocycle. In summary, SBA-L is an effective copper(II) and cadmium(II) chelator for possible applications where metal removal, storage and recovery are of basic importance.
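For readers unfamiliar with the two models named in this abstract, the standard textbook forms of the Langmuir isotherm and the pseudo-second-order kinetic law can be sketched as below. Only the maximum loading of 20.9 mg/g for copper comes from the abstract; the Langmuir constant `k_l` and the rate constant `k2` used in the example are hypothetical values chosen for illustration, not fitted parameters from the paper.

```python
def langmuir_uptake(c_eq, q_max, k_l):
    """Langmuir isotherm: equilibrium uptake q_e (mg/g).

    c_eq  -- equilibrium concentration in solution (mg/L)
    q_max -- maximum loading capacity (mg/g)
    k_l   -- Langmuir constant (L/mg); hypothetical in this sketch
    """
    return q_max * k_l * c_eq / (1.0 + k_l * c_eq)


def pseudo_second_order_uptake(t, q_e, k2):
    """Pseudo-second-order kinetics: uptake q_t (mg/g) at time t.

    t   -- contact time (min)
    q_e -- equilibrium uptake (mg/g)
    k2  -- rate constant (g/(mg min)); hypothetical in this sketch
    """
    return k2 * q_e ** 2 * t / (1.0 + k2 * q_e * t)


if __name__ == "__main__":
    q_max_cu = 20.9  # mg/g, maximum Cu loading reported in the abstract
    k_l = 0.5        # L/mg, assumed for illustration only
    k2 = 0.01        # g/(mg min), assumed for illustration only
    for c in (1.0, 2.0, 10.0, 100.0):
        print(c, langmuir_uptake(c, q_max_cu, k_l))
    for t in (1.0, 10.0, 60.0):
        print(t, pseudo_second_order_uptake(t, q_max_cu, k2))
```

Both curves saturate: the isotherm approaches `q_max` at high concentration, and the kinetic curve approaches `q_e` at long contact times, reaching half of each limit at `c_eq = 1/k_l` and `t = 1/(k2 * q_e)` respectively.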

    Predictive and motivational factors influencing anticipatory contrast: A comparison of contextual and gustatory predictors in food restricted and free-fed rats

    In anticipation of palatable food, rats can learn to restrict consumption of a less rewarding food type resulting in an increased consumption of the preferred food when it is made available. This construct is known as anticipatory negative contrast (ANC) and can help elucidate the processes that underlie binge-like behavior as well as self-control in rodent motivation models. In the current investigation we aimed to shed light on the ability of distinct predictors of a preferred food choice to generate contrast effects and the motivational processes that underlie this behavior. Using a novel set of rewarding solutions, we directly compared contextual and gustatory ANC predictors in both food restricted and free-fed Sprague-Dawley rats. Our results indicate that, despite being food restricted, rats are selective in their eating behavior and show strong contextually-driven ANC similar to free-fed animals. These differences mirrored changes in palatability for the less preferred solution across the different sessions as measured by lick microstructure analysis. In contrast to previous research, predictive cues in both food restricted and free-fed rats were sufficient for ANC to develop although flavor-driven ANC did not relate to a corresponding change in lick patterning. These differences in the lick microstructure between context- and flavor-driven ANC indicate that the motivational processes underlying ANC generated by the two predictor types are distinct. Moreover, an increase in premature port entries to the unavailable sipper – a second measure of ANC – in all groups reveals a direct influence of response competition on ANC development

    Transcription of conference room meetings: an investigation

    The automatic processing of speech collected in conference style meetings has attracted considerable interest, with several large scale projects devoted to this area. In this paper we explore the use of various meeting corpora for the purpose of automatic speech recognition. In particular, we investigate the similarity of these resources and how to use them efficiently in the construction of a meeting transcription system. The analysis shows distinctive features for each resource. However, the benefit of pooling data, and hence the similarity, seems sufficient to speak of a generic conference meeting domain. In this context, this paper also presents development work for the AMI meeting transcription system, a joint effort by seven sites working on the AMI (augmented multi-party interaction) project.